1 Multigrain shared memory 82%

Donald Yeung , John Kubiatowicz , Anant Agarwal
ACM Transactions on Computer Systems (TOCS) May 2000
Volume 18 Issue 2 

Parallel workstations, each comprising tens of processors based on shared memory, 
promise cost-effective scalable multiprocessing. This article explores the coupling of 
such small- to medium-scale shared-memory multiprocessors through software over a 
local area network to synthesize larger shared-memory systems. We call these 
systems Distributed Shared-memory Multiprocessors (DSMPs). This article introduces 
the design of a shared-memory system that uses multiple granularities of sharing, 
ca ... 



2 Architecture and design of AlphaServer GS320 

Kourosh Gharachorloo , Madhu Sharma , Simon Steely , Stephen Van Doren
Proceedings of the ninth international conference on Architectural support for
programming languages and operating systems November 2000 
Volume 28 , 34 Issue 5 , 5 

This paper describes the architecture and implementation of the AlphaServer GS320, a 
cache-coherent non-uniform memory access multiprocessor developed at Compaq. 
The AlphaServer GS320 architecture is specifically targeted at medium-scale 
multiprocessing with 32 to 64 processors. Each node in the design consists of four 
Alpha 21264 processors, up to 32GB of coherent memory, and an aggressive IO 
subsystem. The current implementation supports up to 8 such nodes for a total of 32 
processors. While s ... 



3 Architecture and design of AlphaServer GS320 80% 

Kourosh Gharachorloo , Madhu Sharma , Simon Steely , Stephen Van Doren
ACM SIGPLAN Notices November 2000
Volume 35 Issue 11 

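This paper describes the architecture and implementation of the AlphaServer GS320, a
cache-coherent non-uniform memory access multiprocessor developed at Compaq.
The AlphaServer GS320 architecture is specifically targeted at medium-scale
multiprocessing with 32 to 64 processors. Each node in the design consists of four
Alpha 21264 processors, up to 32GB of coherent memory, and an aggressive IO
subsystem. The current implementation supports up to 8 such nodes for a total of 32
processors. While s ...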

4 The Stanford FLASH multiprocessor 80% 

Jeffrey Kuskin , David Ofelt , Mark Heinrich , John Heinlein , Richard Simoni ,
K. Gharachorloo , J. Chapin , D. Nakahira , J. Baxter , M. Horowitz , A. Gupta ,
M. Rosenblum , J. Hennessy

25 years of the international symposia on Computer architecture (selected 
papers) August 1998 

5 The Stanford FLASH multiprocessor 80% 
J. Kuskin , D. Ofelt , M. Heinrich , J. Heinlein , R. Simoni , K. Gharachorloo , J. Chapin ,
D. Nakahira , J. Baxter , M. Horowitz , A. Gupta , M. Rosenblum , J. Hennessy

ACM SIGARCH Computer Architecture News , Proceedings of the 21st annual
international symposium on Computer architecture April 1994
Volume 22 Issue 2 

The FLASH multiprocessor efficiently integrates support for cache-coherent shared 
memory and high-performance message passing, while minimizing both hardware and 
software overhead. Each node in FLASH contains a microprocessor, a portion of the 
machine's global memory, a port to the interconnection network, an I/O interface, and 
a custom node controller called MAGIC. The MAGIC chip handles all communication 
both within the node and among nodes, using hardwired data paths for efficient data 
moveme ... 

6 Computing curricula 2001 80% 

Journal on Educational Resources in Computing (JERIC) September 2001

7 Distributed file systems: concepts and examples 77%

Eliezer Levy , Abraham Silberschatz 
ACM Computing Surveys (CSUR) December 1990 
Volume 22 Issue 4 

The purpose of a distributed file system (DFS) is to allow users of physically 
distributed computers to share data and storage resources by using a common file 
system. A typical configuration for a DFS is a collection of workstations and 
mainframes connected by a local area network (LAN). A DFS is implemented as part of 
the operating system of each of the connected computers. This paper establishes a 
viewpoint that emphasizes the dispersed structure and decentralization of both data 
and con ... 

8 Fast detection of communication patterns in distributed executions 77% 

Thomas Kunz , Michiel F. H. Seuren 
Proceedings of the 1997 conference of the Centre for Advanced Studies on
Collaborative research November 1997

Understanding distributed applications is a tedious and difficult task. Visualizations 
based on process-time diagrams are often used to obtain a better understanding of 
the execution of the application. The visualization tool we use is Poet, an event tracer 
developed at the University of Waterloo. However, these diagrams are often very 
complex and do not provide the user with the desired overview of the application. In
our experience, such tools display repeated occurrences of non-trivial commun ...

9 Mark IIIfp hypercube concurrent processor architecture 77%

J. Tuazon , J. Peterson , M. Pniel 
Proceedings of the third conference on Hypercube concurrent computers and
applications: Architecture, software, computer systems, and general issues -
Volume 1 January 1988

The Mark IIIfp Hypercube is a new generation of hypercube concurrent processor
system developed at JPL/Caltech, with peak performance of 5 Mips, 14 Mflops per
node, and a peak communication rate of 6 Mbytes per second. Each node utilizes two
Motorola MC68020 microprocessors, an MC68882 scalar floating-point coprocessor,
and a Weitek 8000 floating-point chip set. One of the MC68020 processors serves as 
the application and computational processor, the other is dedicated to communication. 
The ... 



10 Accelerating shared virtual memory via general-purpose network 77% 
interface support

Angelos Bilas , Dongming Jiang , Jaswinder Pal Singh 

ACM Transactions on Computer Systems (TOCS) February 2001 

Volume 19 Issue 1 

Clusters of symmetric multiprocessors (SMPs) are important platforms for high- 
performance computing. With the success of hardware cache-coherent distributed 
shared memory (DSM), a lot of effort has also been made to support the coherent 
shared-address-space programming model in software on clusters. Much research has 
been done in fast communication on clusters and in protocols for supporting software 
shared memory across them. However, the performance of software virtual memory 
(SVM) is sti ... 



11 Piranha: a scalable architecture based on single-chip multiprocessing 77% 

Luiz Andre Barroso , Kourosh Gharachorloo , Robert McNamara , Andreas Nowatzyk ,
Shaz Qadeer , Barton Sano , Scott Smith , Robert Stets , Ben Verghese

ACM SIGARCH Computer Architecture News , Proceedings of the 27th annual 

international symposium on Computer architecture May 2000 

Volume 28 Issue 2 

The microprocessor industry is currently struggling with higher development costs and 
longer design times that arise from exceedingly complex processors that are pushing 
the limits of instruction-level parallelism. Meanwhile, such designs are especially ill 
suited for important commercial applications, such as on-line transaction processing 
(OLTP), which suffer from large memory stall times and exhibit little instruction-level 
parallelism. Given that commercial applications constitute by fa ... 



12 Adaptive, fine-grained sharing in a client-server OODBMS: a callback- 77% 
based approach

Markos Zaharioudakis , Michael J. Carey , Michael J. Franklin 

ACM Transactions on Database Systems (TODS) December 1997 

Volume 22 Issue 4 

For reasons of simplicity and communication efficiency, a number of existing object- 
oriented database management systems are based on page server architectures; data 
pages are their minimum unit of transfer and client caching. Despite their efficiency, 
page servers are often criticized as being too restrictive when it comes to concurrency,
as existing systems use pages as the minimum locking unit as well. In this paper we
show how to support object-level locking in a page-server context. Sev ...

13 SoftFLASH: analyzing the performance of clustered distributed virtual 77%
shared memory 

Andrew Erlichson , Neal Nuckolls , Greg Chesson , John Hennessy 

Proceedings of the seventh international conference on Architectural support for 

programming languages and operating systems September 1996 

Volume 31 , 30 Issue 9 , 5 

One potentially attractive way to build large-scale shared-memory machines is to use 
small-scale to medium-scale shared-memory machines as clusters that are 
interconnected with an off-the-shelf network. To create a shared-memory 
programming environment across the clusters, it is possible to use a virtual shared- 
memory software layer. Because of the low latency and high bandwidth of the 
interconnect available within each cluster, there are clear advantages in making the 
clusters as large as possi ... 

14 Verification techniques for cache coherence protocols 77% 

Fong Pong , Michel Dubois
ACM Computing Surveys (CSUR) March 1997
Volume 29 Issue 1 

In this article we present a comprehensive survey of various approaches for the 
verification of cache coherence protocols based on state enumeration, symbolic model
checking, and symbolic state models. Since these techniques search the state space of 
the protocol exhaustively, the amount of memory required to manipulate that state 
information and the verification time grow very fast with the number of processors and 
the complexity of the protocol mechanism ... 



15 Scheduler-conscious synchronization 77% 

Leonidas I. Kontothanassis , Robert W. Wisniewski , Michael L. Scott
ACM Transactions on Computer Systems (TOCS) February 1997
Volume 15 Issue 1 

Efficient synchronization is important for achieving good performance in parallel 
programs, especially on large-scale multiprocessors. Most synchronization algorithms 
have been designed to run on a dedicated machine, with one application process per 
processor, and can suffer serious performance degradation in the presence of 
multiprogramming. Problems arise when running processes block or, worse, busy-wait 
for action on the part of a process that the scheduler has chosen not to run. We 
show ... 

16 SM-prof: a tool to visualise and find cache coherence performance 77% 
bottlenecks in multiprocessor programs

Mats Brorsson 

ACM SIGMETRICS Performance Evaluation Review , Proceedings of the 1995 ACM 
SIGMETRICS joint international conference on Measurement and modeling of 
computer systems May 1995 
Volume 23 Issue 1 

Cache misses due to coherence actions are often the major source for performance 
degradation in cache coherent multiprocessors. It is often difficult for the programmer 
to take cache coherence into account when writing the program since the resulting 
access pattern is not apparent until the program is executed. SM-prof is a performance 
analysis tool that addresses this problem by visualising the shared data access pattern 
in a diagram with links to the source code lines causing performance degrad ...

Michael J. Carey , Michael J. Franklin , Markos Zaharioudakis
ACM SIGMOD Record , Proceedings of the 1994 ACM SIGMOD international
conference on Management of data May 1994 
Volume 23 Issue 2 

For reasons of simplicity and communication efficiency, a number of existing object- 
oriented database management systems are based on page server architectures; data 
pages are their minimum unit of transfer and client caching. Despite their efficiency, 
page servers are often criticized as being too restrictive when it comes to concurrency, 
as existing systems use pages as the minimum locking unit as well. In this paper we 
show how to support object-level locking in a page server context. Se ... 



18 The Wisconsin Wind Tunnel: virtual prototyping of parallel computers 

Steven K. Reinhardt , Mark D. Hill , James R. Larus , Alvin R. Lebeck , James C. Lewis , 
David A. Wood 

ACM SIGMETRICS Performance Evaluation Review , Proceedings of the 1993 ACM 
SIGMETRICS conference on Measurement and modeling of computer systems

June 1993 
Volume 21 Issue 1